This article addresses porting a code developed for the CM-5 to a
Cray  T3D.  A  second  article  addressing  optimization  on  the  Cray 
T3D  will  appear  in  the  next  issue. 


Driven  by  large  speed  and  memory  requirements  of  3D  computations, 
numerical  formulations  are  increasingly  adapted  for  use  with  a 
variety  of  massively  parallel  supercomputers.  However,  the  style 
of  programming  varies  from  one  architecture  to  another  and 
porting  codes  across  machines  while  maintaining  efficiency 
becomes  a  major  issue.  Recently,  the  research  group  of  Tayfun 
Tezduyar  (Professor  of  Aerospace  Engineering  and  Mechanics)  at 
the  AHPCRC  ported  the  finite  element  flow  solvers,  which  were 
originally  developed  for  the  CM-5,  to  the  Cray  T3D.  Marek  Behr, 
an  Assistant  Professor  at  the  AHPCRC,  performed  the  porting  of 
the  incompressible  flow  code  from  the  CM-5  to  the  Cray  T3D. 
Subsequently, this author extended the code to a compressible
flow solver. Postdoctoral Research Associate Andrew Johnson and
graduate student Vinay Kalro are also involved in parallel
finite element computations on the Cray T3D.


The  Cray  T3D,  which  is  the  first  massively  parallel  supercomputer 
from  Cray  Research,  was  officially  unveiled  in  late  1993.  This 
scalable  machine  with  32  to  2048  processors  has  a  peak 
performance  from  4.8  to  over  300  GFLOPS,  with  memory  capacity  of 
512 Mbytes to 128 Gbytes. The Cray T3D takes advantage of a
fast bidirectional 3D torus network, which minimizes
inter-processor communication times and ensures short connection
paths and high bisection bandwidth. The technical features of the
Cray T3D, together with its remarkable stability in operation, make
this machine one of the state-of-the-art systems for parallel
computations.
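
The quoted configuration range implies per-processor figures that are
useful to keep in mind for the performance discussion. The following
is a minimal arithmetic sketch using only the numbers quoted above; the
variable names are illustrative, and the conversion 1 Gbyte = 1024
Mbytes is an assumption.

```python
# Figures quoted in the article for the smallest and largest configurations.
MIN_PROCESSORS = 32
MAX_PROCESSORS = 2048
MIN_PEAK_GFLOPS = 4.8
MIN_MEMORY_MBYTES = 512
MAX_MEMORY_MBYTES = 128 * 1024  # assumption: 1 Gbyte = 1024 Mbytes

# Peak rate of a single processor, derived from the smallest configuration.
peak_mflops_per_pe = MIN_PEAK_GFLOPS * 1000.0 / MIN_PROCESSORS

# Aggregate peak of the largest configuration at that per-processor rate.
max_peak_gflops = peak_mflops_per_pe * MAX_PROCESSORS / 1000.0

# Memory per processor at both ends of the range.
min_mbytes_per_pe = MIN_MEMORY_MBYTES / MIN_PROCESSORS
max_mbytes_per_pe = MAX_MEMORY_MBYTES / MAX_PROCESSORS

print(f"peak per processor: {peak_mflops_per_pe:.0f} MFLOPS")
print(f"peak at {MAX_PROCESSORS} processors: {max_peak_gflops:.1f} GFLOPS")
print(f"memory per processor: {min_mbytes_per_pe:.0f} to "
      f"{max_mbytes_per_pe:.0f} Mbytes")
```

Under these assumptions the per-processor peak is about 150 MFLOPS,
which is consistent with the "over 300 GFLOPS" quoted for 2048
processors.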

As  the  message-passing  programming  model  was  the  sole  model 
available  to  initial  T3D  users,  the  existing  CM  code  could  only 
be  used  as  a  guide  for  a  new  message-passing  implementation.  This 
implementation is still based on two distinct main data storage
modes, the element-level mode and the equation-level mode,
similar to the CM-5 implementation. In order to minimize
inter-processor communication, these two storage modes have to be
suitably  aligned  with  contiguous  partitions  of  elements  assigned 
to  individual  processors.  Each  processor  must  hold  the  data 
pertaining  to  both  the  element  partition  and  a  majority  of  the 
equations  associated  with  this  partition.  The  necessary 
communication  between  the  two  storage  modes  is  performed  via  two- 
step  gather  and  scatter  routines,  similar  in  functionality  to 
those available in CM scientific libraries. The inter-processor
communication inside the gather and scatter steps, which is now
restricted to partition boundary data, is accomplished using send
and  receive  functions  of  the  Parallel  Virtual  Machine  (PVM) 
library.  To  improve  performance,  Cray-specific  PVM  extensions 
such as channels are also employed. After a significant amount of
scalar code optimization, mostly related to minimizing
out-of-cache memory access, the performance of the most
computationally intensive parts of the code reached the order of
20 MFLOPS per processing node, which is comparable to the per-node
performance available on the CM-5. The advantages of the T3D
include a larger per-processor memory and a smaller communication
cost penalty for communication-intensive algorithms, e.g., more
complex preconditioners.
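
The two-step gather and scatter described above can be illustrated
with a small sketch. This is not the AHPCRC implementation and it does
not use PVM: two "processors" are simulated inside a single Python
process, and dictionary reads stand in for send and receive calls. The
five-node 1D mesh, the ownership map, and the function names are all
hypothetical, chosen only to show that the second step touches
partition boundary data alone.

```python
# Illustrative sketch only: two "processors" are simulated inside one Python
# process, and plain dictionary reads stand in for PVM send/receive calls.
# The mesh, the ownership map, and all names here are hypothetical.

ELEMENTS = [(0, 1), (1, 2), (2, 3), (3, 4)]   # element -> its two nodes
ELEM_PARTS = {0: [0, 1], 1: [2, 3]}            # rank -> contiguous elements
NODE_OWNER = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}    # node 2 lies on the boundary


def gather(owned, rank):
    """Equation-level -> element-level data for one partition.

    Returns the element-level values and the list of boundary nodes whose
    values had to come from another rank.
    """
    local = dict(owned[rank])                       # step 1: on-processor data
    needed = {n for e in ELEM_PARTS[rank] for n in ELEMENTS[e]}
    fetched = sorted(n for n in needed if n not in local)
    for n in fetched:                               # step 2: boundary data only
        local[n] = owned[NODE_OWNER[n]][n]
    elem_values = {e: tuple(local[n] for n in ELEMENTS[e])
                   for e in ELEM_PARTS[rank]}
    return elem_values, fetched


def scatter(elem_values):
    """Element-level -> equation-level data, summing shared-node contributions.

    Returns the per-rank nodal sums and the list of boundary messages
    as (source rank, destination rank, node, value) tuples.
    """
    owned = {r: {n: 0.0 for n, o in NODE_OWNER.items() if o == r}
             for r in ELEM_PARTS}
    messages = []
    for rank, elems in ELEM_PARTS.items():
        partial = {}                                # step 1: on-processor sums
        for e in elems:
            for n, v in zip(ELEMENTS[e], elem_values[e]):
                partial[n] = partial.get(n, 0.0) + v
        for n, v in partial.items():
            if NODE_OWNER[n] == rank:
                owned[rank][n] += v
            else:                                   # step 2: boundary data only
                messages.append((rank, NODE_OWNER[n], n, v))
    for _src, dst, n, v in messages:
        owned[dst][n] += v
    return owned, messages
```

In this toy layout each rank owns most of the nodes its elements
touch, so a gather on rank 1 fetches only node 2, and a scatter
produces a single message carrying rank 1's contribution to that
node. Message volume therefore scales with the partition boundary
rather than with the partition size.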


Figure  1.  Dynamics  of  a  ram  air  parafoil.  The  picture  shows  the 
pressure  distribution  on  the  parafoil  surface. 

The incompressible flow computations on the Cray T3D range from
high-speed, high Reynolds number flows to low-speed natural
convection  flows.  The  dynamics  of  ram  air  parafoils  at  high 
Reynolds  number,  which  is  one  of  the  major  projects  at  the 
AHPCRC,  is  now  partially  simulated  on  the  Cray  T3D.  The  parafoil 
computations  on  the  Cray  T3D  include  the  steady-state  performance 
during  gliding  at  various  angles  of  attack.  The  unstructured  mesh 
used  for  these  computations  consists  of  144,649  nodes  and  905,410 
tetrahedral  elements.  Figure  1  shows  the  pressure  distribution  on 
the  parafoil  surface  at  zero  degree  angle  of  attack  and  Reynolds 
number of 10 million.


Figure  2.  Aerodynamics  of  an  automobile.  The  picture  shows  the 
pressure  distribution  on  the  automobile  surface. 

Another  example  of  incompressible  flow  computations  on  the  Cray 
T3D  is  for  the  aerodynamics  of  automobiles.  Here,  airflow  past  an 
automobile  (modeled  after  a  Saturn)  at  55  miles  per  hour  is 
computed.  This  computation  is  carried  out  on  an  unstructured  mesh 
consisting  of  227,135  nodes  and  1,407,579  tetrahedral  elements 
(for  half  of  the  domain)  under  wind  tunnel  conditions.  Figure  2 
shows  the  pressure  distribution  on  the  automobile  surface.  In 
another  project  being  computed  on  the  Cray  T3D,  the  finite 
element  method  is  used  to  simulate  the  process  of  transient 
convection  in  a  volumetrically  heated  fluid.  This  computation 
requires  the  simultaneous  solution  of  the  Navier-Stokes  equations 
coupled with the energy equation. Computations are carried out on
a 20 x 20 x 20 structured mesh at Rayleigh number 10^5 and Prandtl
number 6.5. Figure 3 shows the temperature field together with
the  mesh  on  three  sides  of  the  box  and  an  iso-surface  of 
temperature  corresponding  to  the  steady-state  solution. 


Figure  3.  Transient  convection  in  a  volumetrically  heated  fluid. 
The  picture  shows  the  temperature  field  together  with  the  mesh  on 
three  sides  of  the  box  and  an  iso-surface  of  temperature 
corresponding  to  the  steady-state  solution. 

The  compressible  flow  research  activities  on  the  Cray  T3D  include 
the  computations  of  subsonic,  transonic,  supersonic  and 
hypersonic  flows  governed  by  either  the  Euler  or  Navier-Stokes 
equations.  An  example  of  supersonic  simulations  on  the  Cray  T3D, 
which  were  carried  out  to  encourage  participation  of  AHPCRC 
undergraduate  research  assistants,  can  be  seen  in  Figure  4.  The 
picture  shows  the  temperature  distribution  on  a  fighter  aircraft 
modeled  after  the  Lockheed  YF-22.  In  this  computation,  the  free- 
stream  Mach  number  is  2  and  the  compressible  flow  is  assumed  to 
be  inviscid  and  governed  by  the  Euler  equations.  The  computation 
is  carried  out  on  an  unstructured  mesh  consisting  of  185,483 
nodes  and  1,071,580  tetrahedral  elements  (for  half  of  the 
domain). The flow simulations for the aircraft and the car were
part  of  an  effort  by  the  AHPCRC  researchers,  partially  funded  by 
the  Advanced  Research  Projects  Agency,  for  the  development  of 
scalable  libraries  for  fluid  mechanics  applications. 


Figure  4.  Supersonic  flow  past  a  fighter  aircraft.  The  simulation 
was  carried  out  to  encourage  participation  of  AHPCRC 
undergraduate  research  assistants.  The  picture  shows  the 
temperature  distribution  on  the  aircraft  surface. 

The  hypersonic  flow  computations  on  the  Cray  T3D  involve  the  air 
in  chemical  equilibrium  with  three  independent  chemical 
reactions:

Figure 5. Hypersonic flow past a circular cylinder. The picture
shows  the  steady-state  temperature  distribution  generated  using 
the  ideal  gas  (left)  and  real  gas  (right)  models. 

To  test  the  accuracy  of  the  real  gas  model  with  chemical 
reactions  as  described,  the  inviscid  air  flow  at  Mach  15  past  a 
circular  cylinder  was  computed.  The  free-stream  temperature  and 
density are 226 K and 0.0187 kg/m^3, respectively. The picture on
the left of Figure 5 shows the steady-state temperature
distribution generated using the ideal gas model. In this case,
the maximum temperature is 10,396 K, which is off by 91% from the
experimental value. The picture on the right of Figure 5 shows
the steady-state temperature distribution generated using the
real gas model. In this case, the maximum temperature is 5,447 K,
which is in excellent agreement with experiment.
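
The two reported maxima can be cross-checked with simple arithmetic.
The sketch below assumes a calorically perfect gas with a ratio of
specific heats of 1.4, a value the article does not state, and
evaluates the standard stagnation-temperature relation
T0 = T (1 + (gamma - 1)/2 M^2) at the quoted free-stream conditions.
Because the experimental temperature is not given in the article, the
real gas result is used as a stand-in for it when computing the
relative difference; both choices are assumptions made for
illustration.

```python
# Free-stream conditions and reported maxima, as quoted in the article.
MACH = 15.0
T_FREESTREAM_K = 226.0
T_MAX_IDEAL_K = 10396.0
T_MAX_REAL_K = 5447.0

# Assumption (not stated in the article): ratio of specific heats for air.
GAMMA = 1.4


def stagnation_temperature(t_inf, mach, gamma):
    """Stagnation temperature of a calorically perfect gas."""
    return t_inf * (1.0 + 0.5 * (gamma - 1.0) * mach ** 2)


def relative_difference(value, reference):
    """Fractional difference of `value` with respect to `reference`."""
    return (value - reference) / reference


t0 = stagnation_temperature(T_FREESTREAM_K, MACH, GAMMA)

# Assumption: the real gas maximum stands in for the experimental value,
# which the article does not quote.
overshoot = relative_difference(T_MAX_IDEAL_K, T_MAX_REAL_K)

print(f"ideal-gas stagnation temperature: {t0:.0f} K")
print(f"ideal vs. real gas maximum: {100.0 * overshoot:.0f}%")
```

With these assumptions the stagnation temperature comes out at about
10,396 K, matching the ideal gas maximum reported above, and the ideal
gas value exceeds the real gas value by about 91%.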


